<!DOCTYPE html>
<html>
<head>
<title>Common Crawl Index Server</title>
<link rel="stylesheet" href="/static/__shared/shared.css"/>
</head>

<body>
<h2>Common Crawl Index Server</h2>

<a href="https://github.com/commoncrawl/cc-index-server"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://camo.githubusercontent.com/38ef81f8aca64bb9a64448d0d70f1308ef5341ab/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f6461726b626c75655f3132313632312e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png"></a>

<p>
Please see the <a href="https://pywb.readthedocs.io/en/latest/manual/cdxserver_api.html#api-reference">PyWB CDX Server API Reference</a> for more examples on how to use the query API (please replace the API endpoint <code>coll/cdx</code> by one of the API endpoints listed in the table below). Alternatively, you may use one of the command-line tools based on this API: Ilya Kreymer's <a href="https://github.com/ikreymer/cc-index-client">Common Crawl Index Client</a>, Greg Lindahl's <a href="https://pypi.org/project/cdx-toolkit/">cdx-toolkit</a> or Corben Leo's <a href="https://github.com/lc/gau">getallurls (gau)</a>.
</p>

<p>
<a href="https://commoncrawl.org/the-data/">Common Crawl data</a> is stored on <a href="https://aws.amazon.com/public-datasets/common-crawl/">Amazon Web Services' Public Data Sets</a>. All data and <a href="https://data.commoncrawl.org/cc-index/collections/index.html">index files</a> are free to download &mdash; run your own index server or analyze the index offline!<br/>
Please do not overload the URL index server for bulk downloads (e.g. all records of the entire .com top-level domain), see the <a href="https://groups.google.com/d/msg/common-crawl/3QmQjFA_3y4/vTbhGqIBBQAJ">download instructions</a>. Alternatively, check the <a href="https://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/">columnar index</a> which allows for efficient aggregations and filtering on any field/column.<br/>
More information about this URL index is found in our <a href="https://commoncrawl.org/2015/04/announcing-the-common-crawl-index/">announcement of the Common Crawl index</a>. For help and support, please visit the <a href="https://groups.google.com/forum/#!forum/common-crawl">Common Crawl user forum</a>.
</p>

<p>
Currently available index collections (also as <a href="/collinfo.json">JSON list</a>):
</p>
<table class="listing">
<thead>
<tr>
  <th>Search Page</th>
  <th>Crawl</th>
  <th>API endpoint</th>
  <th>Index File List on<br/><code>s3://commoncrawl/</code></th>
</tr>
</thead>
<tbody>
{% for route in routes | sort(reverse=True, attribute='path') %}
  {% if route | is_wb_handler %}
  <tr>
    <td style="font-size: larger">
      <b><a href="{{ '/' + route.path }}">{{ '/' + route.path }}</a></b>
    </td>
    <td>
      {% if route.user_metadata.title is defined %}
      {{ route.user_metadata.title }}
      {% endif %}
    </td>
    <td>/{{ route.path }}-index</td>
    <td><a href="https://data.commoncrawl.org/crawl-data/{{route.path}}/cc-index.paths.gz">{{route.path}}/cc-index.paths.gz</td></td>
  </tr>
  {% endif %}
{% endfor %}
</tbody>
</table>

<p style="margin-top: 200px">
Powered by <a href="https://github.com/webrecorder/pywb">pywb</a>
</p>

</body>
</html>
